
    Onebox: Free-Text Interfaces as an Alternative to Complex Web Forms

    This paper investigates the problem of translating free-text queries into key-value pairs as an alternative means of searching "behind" web forms. We introduce a novel specification language for specifying free-text interfaces and report the results of a user study in which we evaluated our prototype in a travel-planner scenario. Our results show that users prefer this free-text interface over the original web form and that they complete their search tasks about 9% faster on average.

    Learning to merge search results for efficient Distributed Information Retrieval

    Merging search results from different servers is a major problem in Distributed Information Retrieval. We used Regression-SVM and Ranking-SVM to learn a function that merges results based on readily available information: the ranks, titles, summaries and URLs contained in the result pages. By not downloading additional information, such as the full documents, we reduce bandwidth usage. CORI and Round Robin merging were used as baselines; surprisingly, our results show that the SVM methods do not improve over those baselines.
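
    Round Robin merging, one of the baselines above, needs no relevance scores at all: it interleaves the servers' ranked lists rank by rank. A minimal Python sketch follows; the function name `round_robin_merge` is hypothetical, and dropping a URL that an earlier server already returned is an assumption added here, not something the abstract specifies.

```python
from itertools import zip_longest


def round_robin_merge(result_lists):
    """Interleave ranked lists: rank 1 from every server, then rank 2, and so on.

    Assumption: a URL already placed by an earlier server is skipped, so each
    URL keeps only its first (highest) position in the merged list.
    """
    merged = []
    seen = set()
    # zip_longest pads shorter lists with None once a server runs out of results.
    for tier in zip_longest(*result_lists):
        for url in tier:
            if url is not None and url not in seen:
                seen.add(url)
                merged.append(url)
    return merged
```

    For example, `round_robin_merge([["a1", "a2", "a3"], ["b1", "b2"], ["c1"]])` returns `["a1", "b1", "c1", "a2", "b2", "a3"]`. The merge ignores titles, summaries and URLs entirely, which is what makes it a natural no-learning baseline for the SVM methods.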

    A probabilistic approach for mapping free-text queries to complex web forms

    Web applications with complex interfaces consisting of multiple input fields should understand free-text queries. We propose a probabilistic approach to map parts of a free-text query to the fields of a complex web form. Our method uses token models rather than only static dictionaries to create this mapping, offering greater flexibility and requiring less domain knowledge than existing systems. We evaluate different implementations of our mapping model and show that our system effectively maps free-text queries without using a dictionary. If a dictionary is available, performance improves and is significantly better than that of a rule-based baseline.
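
    The abstract does not specify the form of the token models. As a minimal sketch, assume each form field gets two add-one-smoothed unigram models learned from example field values: one over lowercased tokens and one over coarse token shapes, so that an unseen value such as the date "2024-06-03" can still be placed. Each query token is then mapped to the highest-scoring field. The example values, the `vocab_size` constant, the `shape` function and all other names below are hypothetical illustrations, not the paper's model.

```python
import math
import re
from collections import Counter


def shape(token):
    """Coarse token shape: ASCII letters -> 'a', digits -> 'd', runs collapsed ('12:30' -> 'd:d')."""
    s = re.sub(r"\d", "d", re.sub(r"[A-Za-z]", "a", token))
    return re.sub(r"(.)\1+", r"\1", s)


class TokenModel:
    """Hypothetical per-field unigram models over tokens and token shapes."""

    def __init__(self, examples, vocab_size=1000):
        self.vocab_size = vocab_size  # assumed vocabulary size for add-one smoothing
        self.words = {field: Counter() for field in examples}
        self.shapes = {field: Counter() for field in examples}
        for field, values in examples.items():
            for value in values:
                for token in value.lower().split():
                    self.words[field][token] += 1
                    self.shapes[field][shape(token)] += 1

    def log_prob(self, field, token):
        """log of P(token | field) * P(shape | field), each add-one smoothed."""
        words, shapes = self.words[field], self.shapes[field]
        p_word = (words[token] + 1) / (sum(words.values()) + self.vocab_size)
        p_shape = (shapes[shape(token)] + 1) / (sum(shapes.values()) + self.vocab_size)
        return math.log(p_word) + math.log(p_shape)

    def map_query(self, query):
        """Assign every query token to the field whose models score it highest."""
        return [(token, max(self.words, key=lambda f: self.log_prob(f, token)))
                for token in query.lower().split()]
```

    With the example values `{"from": ["amsterdam", "utrecht", "den haag"], "to": ["paris", "berlin"], "date": ["2024-05-01", "12 may"]}`, `map_query("Amsterdam Paris 2024-06-03")` maps the three tokens to `from`, `to` and `date` respectively; the date is matched through its shape alone. Scoring each token independently is the simplest possible design: it ignores context such as the keyword "to" preceding a destination.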

    SearchResultFinder: federated search made easy

    Building a federated search engine on top of a large number of existing web search engines is a challenge: implementing the programming interface (API) for each search engine is an exacting and time-consuming job. In this demonstration we present SearchResultFinder, a browser plugin that speeds up the task of determining reusable XPaths for extracting search result items from HTML search result pages. Based on a single search result page, the tool presents a ranked list of candidate extraction XPaths and lets the user highlight the extraction result for inspection. An evaluation with 148 web search engines shows that a correct XPath is suggested in 90% of the cases.

    Deep web search: an overview and roadmap

    We review the state of the art in deep web search and propose a novel classification scheme to better compare deep web search systems. The current binary classification (surfacing versus virtual integration) hides a number of implicit decisions that a developer must make. We make these decisions explicit by distinguishing seven system aspects that describe a system in terms of its functionality (what it can and cannot do) and in terms of its solution to a specific problem. We then motivate the need for a search system with a single-field free-text query interface that supports real-time structured search over multiple sources. To this end, we discuss two possible federated architectures and state the scientific challenges. Finally, we present the findings of our ongoing project and briefly outline related work on free-text interfaces over structured data.

    Ranking XPaths for extracting search result records

    Extracting search result records (SRRs) from webpages is useful for building an aggregated search engine that combines search results from a variety of search engines. Most automatic approaches to search result extraction are not portable: the complete process has to be rerun for every new search result page. In this paper we describe an algorithm that automatically determines XPath expressions for extracting SRRs from webpages. Based on a single search result page, it determines an XPath expression that can be reused to extract SRRs from other pages based on the same template. The algorithm is evaluated on six datasets, including two new datasets containing a variety of web, image, video, shopping and news search results. The evaluation shows that a useful XPath is determined for 85% of the tested search result pages. The algorithm is implemented both as a browser plugin and as a standalone application, which are available as open-source software.
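
    The abstract does not describe how candidate XPaths are scored. The sketch below, using Python's standard `xml.etree.ElementTree` on a well-formed XHTML page, illustrates one plausible heuristic under stated assumptions: every element with a link below it yields a candidate path (tag plus `class` predicate per step, relative to the root element), a path must match at least two elements because records repeat on a result page, and paths are ranked by the total whitespace-normalized text length of their matches. The heuristic, the sample `PAGE`, and the names `step` and `rank_xpaths` are hypothetical, not the published algorithm.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample result page (must be well-formed XML for ElementTree).
PAGE = """<html><body>
<ul class="nav"><li><a href="/">Home</a></li><li><a href="/help">Help</a></li></ul>
<div class="results">
  <div class="r"><a href="https://a.example/">First result</a><span>About the first hit</span></div>
  <div class="r"><a href="https://b.example/">Second result</a><span>About the second hit</span></div>
  <div class="r"><a href="https://c.example/">Third result</a><span>About the third hit</span></div>
</div>
</body></html>"""


def step(elem):
    """One location step: the tag, plus a class predicate when the element has one."""
    cls = elem.get("class")
    return f"{elem.tag}[@class='{cls}']" if cls else elem.tag


def rank_xpaths(root, min_repeats=2):
    """Rank candidate record XPaths, relative to root, found on a single page."""
    counts = defaultdict(int)
    scores = defaultdict(int)

    def walk(elem, path):
        # Heuristic: a record candidate has at least one link among its descendants.
        if elem.find(".//a") is not None:
            counts[path] += 1
            scores[path] += len(" ".join("".join(elem.itertext()).split()))
        for child in elem:
            walk(child, f"{path}/{step(child)}")

    for child in root:
        walk(child, f"./{step(child)}")
    # Records repeat, so paths matching fewer than min_repeats elements are dropped.
    ranked = [(path, scores[path]) for path in scores if counts[path] >= min_repeats]
    return sorted(ranked, key=lambda item: item[1], reverse=True)
```

    On the sample page, `./body/div[@class='results']/div[@class='r']` outranks the navigation links `./body/ul[@class='nav']/li`, because the result records carry much more text. Since the paths use only the XPath subset that ElementTree supports, the top candidate can be reused as-is: `root.findall(path)` on another page built from the same template returns that page's records.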

    Distributed Deep Web Search

    The World Wide Web contains billions of documents (and counting); hence, it is likely that some document contains the answer or content you are searching for. While major search engines like Bing and Google often manage to return relevant results for your query, there are plenty of situations in which they are less capable of doing so. In particular, they fall short when data has to be retrieved from the deep web. Deep web data is difficult for today’s web search engines to crawl and index, largely because it must be accessed via complex web forms. However, deep web data can be highly relevant to the information need of the end user. This thesis gives an overview of the problems, solutions, and paradigms for deep web search. Moreover, it proposes a new paradigm to overcome the apparent limitations of the current state of deep web search, and makes the following scientific contributions:
    1. A more specific classification scheme for deep web search systems, to better illustrate the differences and variation between these systems.
    2. Virtual surfacing, a new, and in our opinion better, deep web search paradigm that tries to combine the benefits of the two existing paradigms, surfacing and virtual integration, and that also raises new research opportunities.
    3. A stack decoding approach that combines rules and statistical usage information to interpret the end user’s free-text query and subsequently derive filled-out web forms from that interpretation.
    4. A practical comparison of the developed approach against a well-established text-processing toolkit.
    5. Empirical evidence that, for a single site, end users would rather use the proposed free-text search interface than a complex web form.
    Analysis of data obtained from user studies shows that the stack decoding approach works as well as, or better than, today’s top-performing alternatives.
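
    Stack decoding keeps, for each number of query tokens consumed, a "stack" of the best partial interpretations, extends them one token at a time, and prunes every stack to a fixed beam width. The minimal Python sketch below shows such a search combining a statistical score with rules, under loudly stated assumptions: the field names, the `USAGE` counts (standing in for statistical usage information), the keyword rule, the contiguity rule and the beam width are all hypothetical, not the thesis's actual model.

```python
import heapq
import math

FIELDS = ("from", "to", "date", "O")  # "O" marks tokens that fill no form field

# Hypothetical usage statistics: how often a token was used as a value of a field.
USAGE = {
    ("from", "O"): 20, ("to", "O"): 20,
    ("amsterdam", "from"): 8, ("amsterdam", "to"): 2,
    ("paris", "from"): 5, ("paris", "to"): 5,
    ("friday", "date"): 9,
}


def token_score(prev_token, token, field):
    """Add-one-smoothed log P(field | token) from usage counts, plus a hand-written rule."""
    count = USAGE.get((token, field), 0)
    total = sum(c for (t, _), c in USAGE.items() if t == token)
    score = math.log((count + 1) / (total + len(FIELDS)))
    # Hypothetical rule: a token right after the keyword "from"/"to" fills that field.
    if prev_token in ("from", "to"):
        score += 2.0 if field == prev_token else -2.0
    return score


def stack_decode(tokens, beam=3):
    """Multi-stack beam decoding: stack i holds the best labelings of tokens[:i]."""
    stacks = [[] for _ in range(len(tokens) + 1)]
    stacks[0] = [(0.0, ())]
    for i, token in enumerate(tokens):
        prev = tokens[i - 1] if i > 0 else None
        for score, labels in stacks[i]:
            for field in FIELDS:
                # Hypothetical rule: a field value is contiguous (no second segment later).
                if field != "O" and field in labels and labels[-1] != field:
                    continue
                stacks[i + 1].append(
                    (score + token_score(prev, token, field), labels + (field,)))
        stacks[i + 1] = heapq.nlargest(beam, stacks[i + 1])  # prune to beam width
    return max(stacks[-1])[1]


def fill_form(tokens, labels):
    """Turn a labeling into key-value pairs, as if filling out the web form."""
    form = {}
    for token, field in zip(tokens, labels):
        if field != "O":
            form[field] = f"{form[field]} {token}" if field in form else token
    return form
```

    For the query "from amsterdam to paris friday", `fill_form(tokens, stack_decode(tokens))` returns `{"from": "amsterdam", "to": "paris", "date": "friday"}`. The contiguity rule is what makes the stacks necessary: because it depends on earlier labels, the best label for a token cannot always be chosen in isolation.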